The major objectives of this training will be to:
Understand the basics of the R programming language and its relevance to local government operations.
Learn how to manipulate and analyze data using R.
Improve proficiency in generating visualizations to communicate insights effectively.
Boost confidence in using R as a routine tool, alongside tools such as Excel and Word.
Develop the skills to undertake data-driven initiatives within the council.
R is a programming language commonly used for statistical computing and graphics. It is free, open-source, and provides a vast array of statistical and graphical techniques for data analysis and visualization. Users can perform data manipulation, modeling, and visualization tasks efficiently using R.
RStudio is an integrated development environment (IDE) for R programming. It provides a user-friendly interface for writing, executing, and debugging R code. RStudio includes features such as syntax highlighting, code completion, and built-in help documentation to support R programming. It also offers tools for managing projects, viewing data, and generating visualizations, making it a comprehensive environment for data analysis and R programming.
Within the local council, R can be used for tasks such as managing payroll (calculating salaries and generating payroll reports), producing documents that combine text and visuals, and analyzing and visualizing council data to identify trends, patterns, and insights that can inform policy decisions.
R download
Visit this link: https://www.r-project.org/. Click where the red arrow is pointing in the following images.
Note: macOS users also need to download and install XQuartz.
RStudio download
Visit https://posit.co/ and follow the steps shown in the images below.
After downloading and installing R and RStudio, open RStudio. The following interface will be displayed.
The source editor is where you write and edit your R code. It provides features such as syntax highlighting (color-coding different elements of the code), automatic indentation, and code completion. You can open multiple script files in separate tabs for easier organization and navigation of your code.
The console is an interactive environment where R commands are executed and output is displayed. You can type commands directly into the console, and R will execute them immediately, showing the results. It also displays error messages, warnings, and other messages generated by R.
The environment pane displays information about the objects (variables, datasets, functions, etc.) currently in your R session. It shows the names, types, and values of objects, and allows you to interact with them (e.g., remove objects, import datasets). The environment pane helps you keep track of your workspace and manage objects effectively.
The history pane shows a history of the commands that have been executed in the console. It provides a record of previous commands, allowing you to review, re-execute, or modify them as needed. You can filter the history by date or keyword to find specific commands quickly.
The files pane allows you to navigate the files and directories on your computer. You can browse, open, edit, and save files directly from within RStudio without using an external file explorer. It supports various file types, including R scripts, text files, CSV files, and more.
The plots pane displays graphical plots generated by R code. When you create plots using functions like plot() or ggplot(), they are shown in the plots pane. You can interact with plots (e.g., zooming, panning) and export them as images or PDF files.
The packages pane provides information about R packages that are installed on your system. It lists all installed packages, along with their version numbers and descriptions. You can install, update, and remove packages using buttons and commands in the packages pane.
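The same package tasks can also be run as commands in the console. The following is a minimal sketch, not a required workflow: ggplot2 is used only as an illustrative package name, and the install line is commented out because it needs an internet connection.

```r
# Install a package from CRAN (run once; needs an internet connection)
# install.packages("ggplot2")

# Load an installed package into the current session.
# "stats" ships with every R installation, so this line always works.
library(stats)

# Collect the names of all packages installed on this computer
installed <- rownames(installed.packages())
head(installed)

# Check whether a particular package is installed
"stats" %in% installed
```

A package is installed once per computer but must be loaded with library() in every new R session where it is used.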
The help pane provides access to R documentation and help files. You can search for specific functions, packages, or topics, and view detailed information about them. It includes descriptions, usage examples, arguments, and references to related functions or topics.
The viewer pane displays various types of content such as HTML files, images, and interactive visualizations. It is commonly used to view HTML reports generated by RMarkdown documents, interactive plots, and other web-based content. The viewer pane allows for interactive exploration of output and results within RStudio.
Open Source: Freely available for download, use, and modification.
Extensive Statistical Capabilities: Comprehensive suite of statistical and graphical techniques.
Rich Visualization Tools: Packages like ggplot2 offer high-quality, customizable plots.
Active and Supportive Community: Large community provides support.
Reproducibility and Documentation: Facilitates reproducible research practices.
In R, objects are similar to containers used to store data, much like a box where you store your belongings. These objects can hold various types of data, such as numbers, text, or even more complex structures like data frames or lists. When you create an object in R, you give it a name of your choice, and you can then use that name to refer to the data stored within it.
To create an object in R, you use the assignment operator, which can be either <- or =, to assign a value or the result of an expression to a name. Here’s a simple example:
a <- 5 + 10
In this example, 5 + 10 is evaluated, and the result (15) is stored in an object named a.
Once you’ve created an object, you can access its contents by simply typing the name of the object. R will then display the value stored within that object. Additionally, you can use the print() function to explicitly display the contents of an object. For example:
a
-- [1] 15
print(a)
-- [1] 15
Both of these methods show the value of the object a, which in this case is 15. All the objects you have created are listed in the Environment pane.
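The steps above can be tried together in one short script. This is a small sketch; the object names b and total are illustrative and not part of the training material.

```r
# Create objects with either assignment operator
a <- 5 + 10      # arrow form, the more common style in R code
b = a * 2        # equals form, same effect here

# Objects can be used to build other objects
total <- a + b
total            # 45

# List the objects in the current session
# (the same names appear in the Environment pane)
ls()

# Remove an object that is no longer needed
rm(b)
"b" %in% ls()    # FALSE
```

Object names are case-sensitive, so total and Total would be two different objects.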
Data types are classifications that specify the nature of the data stored in objects. Understanding data types is crucial for effective data manipulation and analysis. Here’s a brief description of the main data types in R:
Numeric data types represent numerical values, including integers and real numbers (floating-point numbers).
# A double (floating-point) value
num1 <- 3.15
# An integer value; the L suffix is required, because a plain 2 is stored as a double
num2 <- 2L
# Use class() to check what type of data you have
class(num1)
-- [1] "numeric"
class(num2)
-- [1] "integer"
Character data types represent text data enclosed in quotation marks.
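A brief sketch of character data follows; the object names and values are made up for illustration.

```r
# Character values are written inside double or single quotes
department <- "Finance"
greeting <- 'Welcome to the R training'

class(department)              # "character"
nchar(department)              # 7 (number of characters)

# paste() joins pieces of text together
paste(greeting, "-", department)

# A number inside quotes is text, not a number
not_a_number <- "25"
class(not_a_number)            # "character"

# Convert it before doing arithmetic
as.numeric(not_a_number) + 5   # 30
```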
Logical data types represent binary values indicating true or false.
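Logical values are written as TRUE and FALSE (in capitals, without quotes) and are also what comparisons return. The sketch below uses invented names and figures purely for illustration.

```r
# Logical values
is_paid <- TRUE
is_overdue <- FALSE
class(is_paid)              # "logical"

# Comparisons produce logical values
budget <- 1200
spent <- 950
spent <= budget             # TRUE

# Combine conditions with & (and), | (or), ! (not)
is_paid & !is_overdue       # TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))   # 2
```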
Factor data types represent categorical data with predefined levels. Factors are created using the factor() function.
# The object is named factor1 so that it does not share a name with the factor() function
factor1 <- factor(c("low", "medium", "high"))
# To see which integer code is assigned to each level, use str()
str(factor1)
-- Factor w/ 3 levels "high","low","medium": 2 3 1
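By default, factor levels are sorted alphabetically, which is why "high" comes before "low" in the output above. When the categories have a natural ranking, the order can be set explicitly. The following sketch uses illustrative object names and shows one way to do this with the levels and ordered arguments of factor().

```r
ratings <- c("low", "medium", "high", "low")

# Default: levels are sorted alphabetically
default_factor <- factor(ratings)
levels(default_factor)     # "high" "low" "medium"

# Set the level order explicitly so that it matches the natural ranking
ordered_factor <- factor(ratings,
                         levels = c("low", "medium", "high"),
                         ordered = TRUE)
levels(ordered_factor)     # "low" "medium" "high"

# Count how many times each level occurs
table(ordered_factor)

# With an ordered factor, comparisons follow the level order
ordered_factor[1] < ordered_factor[3]   # TRUE, because low < high
```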
Date and time data types represent specific points in time or durations. Date objects are created using the as.Date() function, while date-time objects can be created using as.POSIXct() or as.POSIXlt().
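# The objects are named date1 and time1 so that they do not share names with the date() function and the time() function
date1 <- as.Date("2023-12-31")
time1 <- as.POSIXct("2023-12-31 12:00:00")
# Use str() to check the data type
str(date1)
-- Date[1:1], format: "2023-12-31"
str(time1)
-- POSIXct[1:1], format: "2023-12-31 12:00:00"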
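Storing dates as Date objects rather than as text makes date arithmetic possible. The sketch below uses made-up dates and object names to show a few common operations.

```r
start_date <- as.Date("2023-12-01")
end_date <- as.Date("2023-12-31")

# Subtracting two dates gives the number of days between them
end_date - start_date               # Time difference of 30 days
as.numeric(end_date - start_date)   # 30

# Adding a number to a date moves it forward by that many days
start_date + 14                     # "2023-12-15"

# format() turns a date into text in a chosen layout
format(end_date, "%d/%m/%Y")        # "31/12/2023"

# seq() generates regular dates, here the 1st of three consecutive months
seq(start_date, by = "month", length.out = 3)
```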
In R, objects can hold various types of data, allowing for flexible and versatile data storage and manipulation. There are five main types of objects:
Vectors
Matrices
Arrays
Lists
Data frames
A vector is the simplest type of R object. It holds elements of only one data type.
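The one-data-type rule can be seen by mixing types inside c(): R converts every element to a common type rather than raising an error. This is a short sketch with illustrative object names.

```r
# Mixing numbers and text: every element becomes character
mixed1 <- c(1, "two", 3)
class(mixed1)     # "character"

# Mixing logical and numeric: TRUE becomes 1 and FALSE becomes 0
mixed2 <- c(TRUE, 2, FALSE)
mixed2            # 1 2 0
class(mixed2)     # "numeric"

# Arithmetic is applied to each element of a vector
amounts <- c(100, 250, 75)
amounts * 2       # 200 500 150

# Square brackets pick out elements; positions start at 1
amounts[2]        # 250
length(amounts)   # 3
```

The examples that follow create one vector for each data type.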
# Numeric vector (whole numbers; stored as doubles unless written with the L suffix, e.g. 1L)
vector1 <- c(1, 2, 3, 4, 5)
# Numeric vector (floating-point numbers)
vector2 <- c(1.43, 5.45, 6.98)
# Character vector
character_vector <- c("Diamond", "Gold", "Silver")
# Logical vector
logical_vector <- c(TRUE, FALSE, TRUE, FALSE)
# Factor vector representing categorical data
factor_vector <- factor(c("agree", "strongly agree", "disagree", "strongly disagree"))
A matrix is a two-dimensional array that stores data in rows and columns. It is a special case of a vector object with two dimensions, where all elements are of the same data type. Matrices are useful for organizing data in a tabular format, similar to a spreadsheet.
# Create a 3x3 matrix with numeric values
matrix1 <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
matrix1
--      [,1] [,2] [,3]
-- [1,]    1    4    7
-- [2,]    2    5    8
-- [3,]    3    6    9
# Create a 2x4 matrix with character values
matrix2 <- matrix(data = c("a", "b", "c", "d", "e", "f", "g", "h"), nrow = 2, ncol = 4)
matrix2
--      [,1] [,2] [,3] [,4]
-- [1,] "a"  "c"  "e"  "g"
-- [2,] "b"  "d"  "f"  "h"
# Create a 2x2 matrix with logical values
matrix3 <- matrix(data = c(TRUE, FALSE, FALSE, TRUE), nrow = 2, ncol = 2)
matrix3
--       [,1]  [,2]
-- [1,]  TRUE FALSE
-- [2,] FALSE  TRUE
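As the outputs above show, matrix() fills values column by column. The sketch below, with illustrative object names, shows how to check a matrix's size, pick out elements by row and column, and fill by row instead.

```r
# A 2x3 matrix filled column by column (the default)
m <- matrix(data = 1:6, nrow = 2, ncol = 3)
dim(m)          # 2 3 (rows, columns)

# Index with [row, column]; leave one side empty to take a whole row or column
m[2, 3]         # 6
m[1, ]          # 1 3 5
m[, 2]          # 3 4

# byrow = TRUE fills the matrix row by row instead
m_byrow <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_byrow[1, ]    # 1 2 3

# Arithmetic is applied to every element
m * 10

# t() swaps rows and columns
dim(t(m))       # 3 2
```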
In R, an array is a multi-dimensional extension of a matrix, allowing for more than two dimensions. Like matrices, arrays store data in a structured format, but they can have multiple dimensions, making them suitable for representing higher-dimensional data.
# Create a 3-dimensional array with numeric values
# (the 9 values are recycled to fill all 18 cells, so both layers are identical)
array1 <- array(data = c(1, 2, 3, 4, 5, 6, 7, 8, 9), dim = c(3, 3, 2))
array1
-- , , 1
--
--      [,1] [,2] [,3]
-- [1,]    1    4    7
-- [2,]    2    5    8
-- [3,]    3    6    9
--
-- , , 2
--
--      [,1] [,2] [,3]
-- [1,]    1    4    7
-- [2,]    2    5    8
-- [3,]    3    6    9
# Create a 2-dimensional array with character values
array2 <- array(data = c("a", "b", "c", "d", "e", "f"), dim = c(2, 3))
array2
--      [,1] [,2] [,3]
-- [1,] "a"  "c"  "e"
-- [2,] "b"  "d"  "f"
# Create a 4-dimensional array with logical values
array3 <- array(data = c(TRUE, FALSE, TRUE, FALSE), dim = c(2, 2, 2, 2))
array3
-- , , 1, 1
--
--       [,1]  [,2]
-- [1,]  TRUE  TRUE
-- [2,] FALSE FALSE
--
-- , , 2, 1
--
--       [,1]  [,2]
-- [1,]  TRUE  TRUE
-- [2,] FALSE FALSE
--
-- , , 1, 2
--
--       [,1]  [,2]
-- [1,]  TRUE  TRUE
-- [2,] FALSE FALSE
--
-- , , 2, 2
--
--       [,1]  [,2]
-- [1,]  TRUE  TRUE
-- [2,] FALSE FALSE
# Accessing a specific element of an array: give one index per dimension
array1[2, 3, 1]
-- [1] 8
array2[2, 3]
-- [1] "f"
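Besides single elements, whole slices of an array can be extracted by leaving an index empty. The following sketch uses a hypothetical array named arr, filled with the numbers 1 to 24 so that each cell is easy to identify.

```r
# A 2 x 3 x 4 array: 2 rows, 3 columns, 4 layers
arr <- array(data = 1:24, dim = c(2, 3, 4))
dim(arr)          # 2 3 4
length(arr)       # 24 (total number of cells)

# One index per dimension selects a single cell
arr[2, 3, 4]      # 24
arr[1, 1, 2]      # 7

# Leaving the row and column indices empty returns a whole layer as a matrix
layer1 <- arr[, , 1]
dim(layer1)       # 2 3
```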